Analysis of GLDS-38 from NASA GeneLab

This R Markdown file was auto-generated by the iDEP website using iDEP 0.91, originally by Steven

Ge SX, Son EW, Yao R: iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018, 19(1):534. PMID:30567491

1. Read data

First, we set the working directory to the folder where the files are saved.

 setwd('~/Documents/HTML_R/GLDS38')

R packages and iDEP core functions. Users can also download the iDEP_core_functions.R file. Many R packages need to be installed first, which may take hours. Each of these packages took years to develop, so be a patient thief. Sometimes dependencies need to be installed manually. If you are using an older version of R and have trouble with package installation, try uninstalling the current version of R, deleting all of its folders and files (e.g., C:/Program Files/R/R-3.4.3), and reinstalling from scratch.

 if(file.exists('iDEP_core_functions.R'))
    source('iDEP_core_functions.R') else 
    source('https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/iDEP_core_functions.R') 

We are using the downloaded gene expression file in which gene IDs have been converted to Ensembl gene IDs, because the ID conversion database is too large to download. You can use your original file if it uses Ensembl IDs, or if you do not want to use the pathway files available in iDEP (or they are not available).

 inputFile <- 'GLDS38_Expression.csv'
 sampleInfoFile <- 'GLDS38_Sampleinfo.csv' 
 gldsMetadataFile <- 'GLDS38_Metadata.csv'
 geneInfoFile <- 'Arabidopsis_thaliana__athaliana_eg_gene_GeneInfo.csv' #Gene symbols, location etc. 
 geneSetFile <- 'Arabidopsis_thaliana__athaliana_eg_gene.db'  # pathway database in SQL; can be GMT format 
 STRING10_speciesFile <- 'https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/STRING10_species.csv' 

Parameters for reading data

 input_missingValue <- 'geneMedian' #Missing values imputation method
 input_dataFileFormat <- 1  #1- read counts, 2 FPKM/RPKM or DNA microarray
 input_minCounts <- 0.5 #Min counts
 input_NminSamples <- 1 #Minimum number of samples 
 input_countsLogStart <- 4  #Pseudo count for log CPM
 input_CountsTransform <- 1 #Methods for data transformation of counts. 1-edgeR's logCPM, 2-VST, 3-rlog 
readMetadata.out <- readMetadata(gldsMetadataFile)
library(knitr)   #  install if needed. for showing tables with kable
library(kableExtra)
kable( readMetadata.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%")
FLT_Rep1 FLT_Rep2 FLT_Rep3 GC_Rep1 GC_Rep2 GC_Rep3 LN2_Rep1 LN2_Rep2 LN2_Rep3 RNAlat_Rep1 RNAlat_Rep2 RNAlat_Rep3
Sample.LongId Atha.WT.Col.0.sl.FLT.Rep1.G1S1.RNAseq.RNAseq Atha.WT.Col.0.sl.FLT.Rep2.G1S2.RNAseq.RNAseq Atha.WT.Col.0.sl.FLT.Rep3.G1S3.RNAseq.RNAseq Atha.WT.Col.0.sl.GC.Rep1.G2S1.RNAseq.RNAseq Atha.WT.Col.0.sl.GC.Rep2.G2S2.RNAseq.RNAseq Atha.WT.Col.0.sl.GC.Rep3.G2S3.RNAseq.RNAseq Atha.WT.Col.0.sl.LN2.Rep1.n2.1.RNAseq.RNAseq Atha.WT.Col.0.sl.LN2.Rep2.n2.2.RNAseq.RNAseq Atha.WT.Col.0.sl.LN2.Rep3.n2.3.RNAseq.RNAseq Atha.WT.Col.0.sl.RNAlat.Rep1.rl1.RNAseq.RNAseq Atha.WT.Col.0.sl.RNAlat.Rep2.rl2.RNAseq.RNAseq Atha.WT.Col.0.sl.RNAlat.Rep3.rl3.RNAseq.RNAseq
Sample.Id Atha.WT.Col.0.sl.FLT.Rep1.G1S1 Atha.WT.Col.0.sl.FLT.Rep2.G1S2 Atha.WT.Col.0.sl.FLT.Rep3.G1S3 Atha.WT.Col.0.sl.GC.Rep1.G2S1 Atha.WT.Col.0.sl.GC.Rep2.G2S2 Atha.WT.Col.0.sl.GC.Rep3.G2S3 Atha.WT.Col.0.sl.LN2.Rep1.n2.1 Atha.WT.Col.0.sl.LN2.Rep2.n2.2 Atha.WT.Col.0.sl.LN2.Rep3.n2.3 Atha.WT.Col.0.sl.RNAlat.Rep1.rl1 Atha.WT.Col.0.sl.RNAlat.Rep2.rl2 Atha.WT.Col.0.sl.RNAlat.Rep3.rl3
Sample.Name Atha_WT-Col-0_sl_FLT_Rep1_G1S1 Atha_WT-Col-0_sl_FLT_Rep2_G1S2 Atha_WT-Col-0_sl_FLT_Rep3_G1S3 Atha_WT-Col-0_sl_GC_Rep1_G2S1 Atha_WT-Col-0_sl_GC_Rep2_G2S2 Atha_WT-Col-0_sl_GC_Rep3_G2S3 Atha_WT-Col-0_sl_LN2_Rep1_n2-1 Atha_WT-Col-0_sl_LN2_Rep2_n2-2 Atha_WT-Col-0_sl_LN2_Rep3_n2-3 Atha_WT-Col-0_sl_RNAlat_Rep1_rl1 Atha_WT-Col-0_sl_RNAlat_Rep2_rl2 Atha_WT-Col-0_sl_RNAlat_Rep3_rl3
GLDS 38 38 38 38 38 38 38 38 38 38 38 38
Accession GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38 GLDS-38
Hardware BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC
Tissue Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling Etiolated seedling
Age 8 days 8 days 8 days 8 days 8 days 8 days 8 days 8 days 8 days 8 days 8 days 8 days
Organism Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana
Ecotype Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0
Genotype WT WT WT WT WT WT WT WT WT WT WT WT
Variety Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT Col-0 WT
Radiation Cosmic radiation Cosmic radiation Cosmic radiation Background Earth Background Earth Background Earth Background Earth Background Earth Background Earth Background Earth Background Earth Background Earth
Gravity Microgravity Microgravity Microgravity Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial
Developmental Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings Etiolated 8 day old seedlings
Time.series.or.Concentration.gradient Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point
Light Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark
Assay..RNAseq. RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling RNAseq Transcription and Proteomic Profiling
Temperature 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24
Treatment.type Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity Proteomics and Transcriptomics analysis of Arabidopsis Seedlings in Microgravity
Treatment.intensity X X X X X X X X X X X X
Treament.timing X X X X X X X X X X X X
Preservation.Method. RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater Liquid Nitrogen Liquid Nitrogen Liquid Nitrogen RNAlater RNAlater RNAlater
 readData.out <- readData(inputFile) 
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
   kable( head(readData.out$data) ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| | FLT_Rep1 | FLT_Rep2 | FLT_Rep3 | GC_Rep1 | GC_Rep2 | GC_Rep3 | LN2_Rep1 | LN2_Rep2 | LN2_Rep3 | RNAlat_Rep1 | RNAlat_Rep2 | RNAlat_Rep3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATCG00490 | 19.02600 | 20.20307 | 20.47902 | 19.35267 | 20.26267 | 19.81133 | 17.85944 | 18.14881 | 18.28268 | 19.90399 | 19.24373 | 18.83189 |
| AT2G41310 | 18.59846 | 19.97237 | 19.85194 | 20.18207 | 18.98753 | 19.11273 | 18.29485 | 18.20631 | 18.20418 | 18.69872 | 18.47971 | 18.18740 |
| ATCG00020 | 17.76784 | 18.79161 | 19.31738 | 17.92858 | 19.00300 | 18.51242 | 17.11173 | 17.35404 | 17.37928 | 18.97660 | 18.68141 | 18.13132 |
| AT3G21720 | 17.38043 | 17.29131 | 17.40174 | 18.45645 | 16.52706 | 17.41765 | 16.35671 | 15.58054 | 15.69393 | 14.72961 | 14.46187 | 14.90556 |
| AT2G07671 | 16.80806 | 17.55228 | 17.74476 | 16.89524 | 17.38578 | 17.32608 | 15.95185 | 15.89930 | 15.66004 | 16.40474 | 16.05359 | 15.88383 |
| ATCG00280 | 16.09754 | 17.02548 | 17.38910 | 17.07244 | 17.48614 | 17.20404 | 15.47935 | 15.34283 | 15.45231 | 17.24330 | 16.73648 | 16.46234 |
 readSampleInfo.out <- readSampleInfo(sampleInfoFile) 
 kable( readSampleInfo.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| | Gravity | Preservation.Method. |
|---|---|---|
| FLT_Rep1 | Microgravity | RNAlater |
| FLT_Rep2 | Microgravity | RNAlater |
| FLT_Rep3 | Microgravity | RNAlater |
| GC_Rep1 | Terrestrial | RNAlater |
| GC_Rep2 | Terrestrial | RNAlater |
| GC_Rep3 | Terrestrial | RNAlater |
| LN2_Rep1 | Terrestrial | Liquid Nitrogen |
| LN2_Rep2 | Terrestrial | Liquid Nitrogen |
| LN2_Rep3 | Terrestrial | Liquid Nitrogen |
| RNAlat_Rep1 | Terrestrial | RNAlater |
| RNAlat_Rep2 | Terrestrial | RNAlater |
| RNAlat_Rep3 | Terrestrial | RNAlater |
 input_selectOrg ="NEW" 
 input_selectGO <- 'GOBP'   #Gene set category 
 input_noIDConversion = TRUE  
 allGeneInfo.out <- geneInfo(geneInfoFile) 
 converted.out = NULL 
 convertedData.out <- convertedData()    
 nGenesFilter()  
## [1] "16156 genes in 12 samples. 16117  genes passed filter.\n Original gene IDs used."
 convertedCounts.out <- convertedCounts()  # converted counts, just for compatibility 
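
The filtering and transformation parameters set earlier (input_minCounts, input_NminSamples, input_countsLogStart) can be illustrated with a small base-R sketch. This is an approximation, not iDEP's or edgeR's actual implementation: it assumes the filter keeps genes reaching 0.5 counts per million (CPM) in at least one sample, and it uses a plain log2(CPM + 4) transform, whereas edgeR scales its prior count by library size. The toy matrix `counts` and all variable names are made up for illustration.

```r
# Toy count matrix (hypothetical values): 4 genes x 3 samples
counts <- matrix(c(  0,   0,   1,
                    10,  20,  30,
                   500, 400, 600,
                     0,   0,   0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("S", 1:3)))

min_cpm     <- 0.5  # mirrors input_minCounts
min_samples <- 1    # mirrors input_NminSamples
pseudo      <- 4    # mirrors input_countsLogStart

# Counts per million: scale each column by its library size
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Keep genes that reach min_cpm in at least min_samples samples
keep <- rowSums(cpm >= min_cpm) >= min_samples

# Simple log2 transform with a pseudo count (an approximation of logCPM)
log_cpm <- log2(cpm[keep, , drop = FALSE] + pseudo)
```

Here `gene4` has no reads in any sample and is dropped, while the other three genes pass the filter.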

2. Pre-process

# Read counts per library 
 parDefault = par() 
 par(mar=c(12,4,2,2)) 
 # barplot of total read counts
 x <- readData.out$rawCounts
 groups = as.factor( detectGroups(colnames(x ) ) )
 if(nlevels(groups)<=1 || nlevels(groups) >20 )  
  col1 = 'green'  else
  col1 = rainbow(nlevels(groups))[ groups ]             
         
 barplot( colSums(x)/1e6, 
        col=col1,las=3, main="Total read counts (millions)")  

 readCountsBias()  # detecting bias in sequencing depth 
## [1] 0.02579404
## [1] 0.1547212
## [1] 0.05122098
## [1] "Warning! Sequencing depth bias detected. Total read counts are significantly different among sample groups (p= 2.58e-02 ) based on ANOVA."
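
The message above reports an ANOVA of total read counts among sample groups. The internals of readCountsBias() are not shown in this report, so the following is only a sketch of that kind of test, using hypothetical library sizes and group labels.

```r
# Hypothetical library sizes (millions of reads) for two groups of three samples
total_reads <- c(20.1, 19.8, 20.5, 25.2, 24.9, 25.7)
group <- factor(rep(c("A", "B"), each = 3))

# One-way ANOVA: does mean sequencing depth differ between groups?
fit <- aov(total_reads ~ group)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

if (p_value < 0.05)
  message("Sequencing depth differs among groups, p = ", signif(p_value, 3))
```

A small p-value here only says that library sizes differ by group. Whether that affects downstream results depends on the normalization used.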
 # Box plot 
 x = readData.out$data 
 boxplot(x, las = 2, col=col1,
    ylab='Transformed expression levels',
    main='Distribution of transformed data') 

 #Density plot 
 par(parDefault) 
## Warning in par(parDefault): graphical parameter "cin" cannot be set
## Warning in par(parDefault): graphical parameter "cra" cannot be set
## Warning in par(parDefault): graphical parameter "csi" cannot be set
## Warning in par(parDefault): graphical parameter "cxy" cannot be set
## Warning in par(parDefault): graphical parameter "din" cannot be set
## Warning in par(parDefault): graphical parameter "page" cannot be set
 densityPlot()       
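
The "cannot be set" warnings above appear because par() with no arguments also returns read-only graphical parameters (cin, cra, csi, cxy, din, page), which cannot be restored. A minimal sketch of the usual way to save and restore settings is shown below. The off-screen device is only there so the example runs anywhere, and the variable names are illustrative.

```r
pdf(NULL)                           # off-screen device so the sketch runs anywhere
old_par <- par(no.readonly = TRUE)  # save only the settable parameters
par(mar = c(12, 4, 2, 2))           # temporary margins for long sample labels
# ... plotting code goes here ...
par(old_par)                        # restore without read-only warnings
restored_mar <- par("mar")
invisible(dev.off())
```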

 # Scatter plot of the first two samples 
 plot(x[,1:2],xlab=colnames(x)[1],ylab=colnames(x)[2], 
    main='Scatter plot of first two samples') 

 ####plot gene or gene family
 input_selectOrg ="BestMatch" 
 input_geneSearch <- 'HOXA' #Gene ID for searching 
 genePlot()  
## NULL
 input_useSD <- 'FALSE' #Use standard deviation instead of standard error in error bar? 
 geneBarPlotError()       
## NULL

3. Heatmap

 # hierarchical clustering tree
 x <- readData.out$data
 maxGene <- apply(x,1,max)
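 # intended to remove the bottom 25% of lowly expressed genes, which inflate the PCC;
 # note: quantile(maxGene)[1] is the 0% quantile (the minimum); [2] would give the 25% cutoff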
 x <- x[which(maxGene > quantile(maxGene)[1] ) ,] 
 plot(as.dendrogram(hclust2( dist2(t(x)))), ylab="1 - Pearson C.C.", type = "rectangle") 
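
The y-axis label above indicates that samples are clustered on 1 minus the Pearson correlation coefficient. dist2() and hclust2() are iDEP helper functions whose definitions are not shown here, so the sketch below uses base R instead. The toy data and the choice of average linkage are assumptions.

```r
set.seed(1)
# Toy expression matrix (hypothetical): 50 genes x 6 samples
expr <- matrix(rnorm(300), nrow = 50,
               dimnames = list(NULL, paste0("S", 1:6)))

# Distance between samples: 1 - Pearson correlation of their expression profiles
d <- as.dist(1 - cor(expr, method = "pearson"))

# Average linkage is an assumption, not necessarily what hclust2() uses
tree <- hclust(d, method = "average")
# plot(as.dendrogram(tree), ylab = "1 - Pearson C.C.")
```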

 #Correlation matrix
 input_labelPCC <- TRUE #Show correlation coefficient? 
 correlationMatrix() 

 # Parameters for heatmap
 input_nGenes <- 1000   #Top genes for heatmap
 input_geneCentering <- TRUE    #centering genes ?
 input_sampleCentering <- FALSE #Center by sample?
 input_geneNormalize <- FALSE   #Normalize by gene?
 input_sampleNormalize <- FALSE #Normalize by sample?
 input_noSampleClustering <- FALSE  #Use original sample order
 input_heatmapCutoff <- 4   #Remove outliers beyond number of SDs 
 input_distFunctions <- 1   #which distance function to use
 input_hclustFunctions <- 1 #Linkage type
 input_heatColors1 <- 1 #Colors
 input_selectFactorsHeatmap <- 'Gravity'    #Sample coloring factors 
 png('heatmap.png', width = 10, height = 15, units = 'in', res = 300) 
 staticHeatmap() 
 dev.off()  
## png 
##   2

[heatmap] (heatmap.png)

 heatmapPlotly() # interactive heatmap using Plotly 

4. K-means clustering

 input_nGenesKNN <- 2000    #Number of genes for k-Means
 input_nClusters <- 4   #Number of clusters 
 maxGeneClustering = 12000
 input_kmeansNormalization <- 'geneMean'    #Normalization
 input_KmeansReRun <- 0 #Random seed 

 distributionSD()  #Distribution of standard deviations 

 KmeansNclusters()  #Number of clusters 

 Kmeans.out = Kmeans()   #Running K-means 
 KmeansHeatmap()   #Heatmap for k-Means 
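
The k-means step above can be sketched in base R: rank genes by standard deviation, keep the most variable ones, center each gene, and cluster. This is an illustration on toy data, not the Kmeans() function from iDEP. It assumes that the 'geneMean' option subtracts each gene's mean.

```r
set.seed(0)
# Toy expression matrix (hypothetical): 200 genes x 12 samples
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(paste0("g", 1:200), NULL))

n_top <- 50   # plays the role of input_nGenesKNN
k     <- 4    # plays the role of input_nClusters

# Rank genes by standard deviation and keep the most variable ones
gene_sd <- apply(expr, 1, sd)
top <- expr[order(gene_sd, decreasing = TRUE)[seq_len(n_top)], ]

# Center each gene on its mean (assumed meaning of 'geneMean')
centered <- top - rowMeans(top)

# Several random starts make the result less dependent on initialization
fit <- kmeans(centered, centers = k, nstart = 10)
table(fit$cluster)
```

Fixing the seed, as input_KmeansReRun does in the report, makes the cluster assignment reproducible between runs.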

 #Read gene sets for enrichment analysis 
 sqlite  <- dbDriver('SQLite')
 input_selectGO3 <- 'GOBP'  #Gene set category
 input_minSetSize <- 15 #Min gene set size
 input_maxSetSize <- 2000   #Max gene set size 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO3,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 # Alternatively, users can use their own GMT files by
 #GeneSets.out <- readGMTRobust('somefile.GMT')  
 results <- KmeansGO()  #Enrichment analysis for k-Means clusters   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| Cluster | adj.Pval | Genes | Pathways |
|---|---|---|---|
| A | 7.99e-122 | 219 | Organonitrogen compound biosynthetic process |
| | 2.84e-118 | 155 | Translation |
| | 4.26e-118 | 155 | Peptide biosynthetic process |
| | 1.81e-116 | 159 | Amide biosynthetic process |
| | 3.07e-113 | 156 | Peptide metabolic process |
| | 1.22e-106 | 160 | Cellular amide metabolic process |
| | 5.05e-51 | 157 | Response to abiotic stimulus |
| | 4.90e-42 | 57 | Photosynthesis |
| | 5.59e-35 | 50 | Response to cytokinin |
| | 7.30e-34 | 63 | Generation of precursor metabolites and energy |
| B | 9.12e-48 | 81 | Cell wall organization or biogenesis |
| | 9.12e-48 | 73 | Cell wall organization |
| | 2.49e-47 | 74 | External encapsulating structure organization |
| | 1.20e-40 | 128 | Response to abiotic stimulus |
| | 2.58e-36 | 74 | Drug metabolic process |
| | 8.64e-36 | 111 | Small molecule metabolic process |
| | 2.30e-34 | 82 | Carbohydrate metabolic process |
| | 2.27e-32 | 77 | Response to inorganic substance |
| | 6.60e-28 | 42 | Plant-type cell wall organization or biogenesis |
| | 1.13e-27 | 51 | Polysaccharide metabolic process |
| C | 3.98e-50 | 79 | Response to external stimulus |
| | 1.34e-42 | 63 | Response to external biotic stimulus |
| | 1.34e-42 | 63 | Response to other organism |
| | 2.33e-42 | 63 | Response to biotic stimulus |
| | 4.56e-40 | 42 | Secondary metabolic process |
| | 1.25e-38 | 63 | Defense response |
| | 2.03e-38 | 67 | Multi-organism process |
| | 2.35e-35 | 78 | Response to abiotic stimulus |
| | 4.27e-34 | 26 | Indole-containing compound metabolic process |
| | 1.99e-33 | 49 | Defense response to other organism |
| D | 1.20e-50 | 151 | Response to abiotic stimulus |
| | 1.95e-47 | 127 | Response to oxygen-containing compound |
| | 2.73e-46 | 98 | Response to inorganic substance |
| | 1.10e-39 | 101 | Response to acid chemical |
| | 2.11e-39 | 113 | Cellular response to chemical stimulus |
| | 6.77e-37 | 126 | Response to organic substance |
| | 1.17e-31 | 107 | Response to hormone |
| | 4.57e-31 | 107 | Response to endogenous stimulus |
| | 3.63e-27 | 75 | Response to external biotic stimulus |
| | 3.63e-27 | 75 | Response to other organism |
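
The adjusted p-values in the enrichment table above come from testing the overlap between each cluster and each gene set. A common way to do this is a hypergeometric test followed by a Benjamini-Hochberg correction. The sketch below assumes that approach and uses made-up numbers. It is not taken from the KmeansGO() source.

```r
# Hypothetical numbers for one cluster and one gene set
N <- 16000  # background genes
K <- 300    # genes annotated to the pathway
n <- 500    # genes in the cluster
k <- 60     # cluster genes that belong to the pathway

# P(X >= k) under the hypergeometric distribution
p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across several tested pathways
p_all <- c(p, 0.01, 0.2, 0.5)   # the other p-values are made up
adj <- p.adjust(p_all, method = "BH")
```

With about 9 overlapping genes expected by chance (500 × 300 / 16000), an observed overlap of 60 gives a very small p-value.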
 input_seedTSNE <- 0    #Random seed for t-SNE
 input_colorGenes <- TRUE   #Color genes in t-SNE plot? 
 tSNEgenePlot()  #Plot genes using t-SNE 

5. PCA and beyond

 input_selectFactors <- 'Gravity'   #Factor coded by color
 input_selectFactors2 <- 'Preservation.Method.' #Factor coded by shape
 input_tsneSeed2 <- 0   #Random seed for t-SNE 
 #PCA, MDS and t-SNE plots
 PCAplot()  

 MDSplot() 

 tSNEplot()  

 #Read gene sets for pathway analysis using PGSEA on principal components 
 input_selectGO6 <- 'GOBP' 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO6,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 PCApathway() # Run PGSEA analysis 
## Warning: Package 'KEGG.db' is deprecated and will be removed from Bioconductor
##   version 3.12

 cat( PCA2factor() )   #The correlation between PCs with factors 
## 
##  Correlation between Principal Components (PCs) with factors
## PC1 is correlated with Gravity (p=3.91e-02).
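
The line above reports an association between a principal component and an experimental factor. How PCA2factor() computes its p-value is not shown in this report. The sketch below assumes a one-way ANOVA of PC scores against the factor, on toy data that mimics the 3 microgravity versus 9 terrestrial design.

```r
set.seed(2)
gravity <- factor(rep(c("Microgravity", "Terrestrial"), times = c(3, 9)))

# Toy data (hypothetical): 100 genes x 12 samples, with 30 genes shifted in flight samples
expr <- matrix(rnorm(100 * 12), nrow = 100)
flight <- gravity == "Microgravity"
expr[1:30, flight] <- expr[1:30, flight] + 3

# prcomp expects samples in rows, so transpose the gene-by-sample matrix
pca <- prcomp(t(expr))
pc1 <- pca$x[, 1]

# One-way ANOVA of PC1 scores against the factor (assumed test)
p_pc1 <- summary(aov(pc1 ~ gravity))[[1]][["Pr(>F)"]][1]
```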

6. DEG1

 input_CountsDEGMethod <- 2 #DESeq2= 3,limma-voom=2,limma-trend=1 
 input_limmaPval <- 0.1 #FDR cutoff
 input_limmaFC <- 2 #Fold-change cutoff
 input_selectModelComprions <- 'Gravity: Microgravity vs. Terrestrial'  #Selected comparisons
 input_selectFactorsModel <- 'Gravity'  #Selected factors for the model
 input_selectInteractions <- NULL   #Selected interaction terms
 input_selectBlockFactorsModel <- NULL  #Selected blocking factors
 factorReferenceLevels.out <- c('Gravity:Terrestrial') 

 limma.out <- limma()
 DEG.data.out <- DEG.data()
 limma.out$comparisons 
## [1] "Microgravity-Terrestrial"
 input_selectComparisonsVenn = head(limma.out$comparisons, 3) # use up to the first three comparisons
 input_UpDownRegulated <- FALSE #Split up and down regulated genes 
 vennPlot() # Venn diagram 

  sigGeneStats() # number of DEGs as figure 

  sigGeneStatsTable() # number of DEGs as table 
##                                       Comparisons  Up Down
## Microgravity-Terrestrial Microgravity-Terrestrial 229  904
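
The Up and Down counts above result from applying the FDR and fold-change cutoffs (input_limmaPval and input_limmaFC) to the per-gene statistics. A minimal sketch of that thresholding follows. The data frame `res` and its column names are hypothetical and do not reflect the structure of limma.out.

```r
# Hypothetical per-gene results: log2 fold-change and adjusted p-value
res <- data.frame(
  log2FC = c( 2.1, -1.5,  0.3, -3.0,   1.2),
  adjP   = c(0.01, 0.05, 0.02, 0.20, 0.001)
)

fdr_cutoff <- 0.1   # mirrors input_limmaPval
fc_cutoff  <- 2     # mirrors input_limmaFC (linear scale)

# A gene is significant if it passes both the FDR and the fold-change cutoff
sig  <- res$adjP < fdr_cutoff & abs(res$log2FC) >= log2(fc_cutoff)
up   <- sum(sig & res$log2FC > 0)
down <- sum(sig & res$log2FC < 0)
```

In this toy example two genes are counted as up-regulated and one as down-regulated. The gene with log2FC of -3.0 fails the FDR cutoff, and the gene with log2FC of 0.3 fails the fold-change cutoff.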

7. DEG2

 input_selectContrast <- 'Microgravity-Terrestrial' #Selected comparisons 
 selectedHeatmap.data.out <- selectedHeatmap.data()
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
 selectedHeatmap()   # heatmap for DEGs in selected comparison
## Error in selectedHeatmap(): object 'selectedHeatmap.data.out' not found
 # Save gene lists and data into files
 write.csv( selectedHeatmap.data()$genes, 'heatmap.data.csv') 
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
 write.csv(DEG.data(),'DEG.data.csv' )
 write(AllGeneListsGMT() ,'AllGeneListsGMT.gmt')
 input_selectGO2 <- 'GOBP'  #Gene set category 
 geneListData.out <- geneListData()  
 volcanoPlot()  

  scatterPlot()  
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
  MAplot()  
## Error in findContrastSamples(input_selectContrast, colnames(convertedData.out), : object 'c.out' not found
  geneListGOTable.out <- geneListGOTable()  
## Error in geneListGOTable(): object 'selectedHeatmap.data.out' not found
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO2,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_removeRedudantSets <- TRUE   #Remove highly redundant gene sets? 
 results <- geneListGO()  #Enrichment analysis
## Error in geneListGO(): object 'geneListGOTable.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 

STRING-db API access. We need to find the taxonomy ID of your species, which is used by STRING. First we try to guess the ID based on iDEP's database. Users can also skip this step and assign the NCBI taxonomy ID directly, e.g. findTaxonomyID.out = 10090 (mouse is 10090, human is 9606, etc.).

 STRING10_species = read.csv(STRING10_speciesFile)  
 ix = grep('Arabidopsis thaliana', STRING10_species$official_name ) 
 findTaxonomyID.out <- STRING10_species[ix,1] # find taxonomyID
 findTaxonomyID.out  
## [1] 3702
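
The lookup above relies on grep() returning exactly one row. A slightly more defensive sketch is shown below. The table excerpt and its column names are assumptions for illustration; the real STRING10_species.csv may be laid out differently.

```r
# Hypothetical excerpt of a species table; column names are assumptions
species <- data.frame(
  species_id    = c(3702, 9606, 10090),
  official_name = c("Arabidopsis thaliana", "Homo sapiens", "Mus musculus")
)

# fixed = TRUE treats the species name as a literal string, not a regular expression
ix <- grep("Arabidopsis thaliana", species$official_name, fixed = TRUE)

# Fall back to NA when the name is missing or matches more than one row
tax_id <- if (length(ix) == 1) species$species_id[ix] else NA
```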

Enrichment analysis using STRING

  STRINGdb_geneList.out <- STRINGdb_geneList() #convert gene lists
## Warning:  we couldn't map to STRING 0% of your identifiers
 input_STRINGdbGO <- 'Process'  #'Process', 'Component', 'Function', 'KEGG', 'Pfam', 'InterPro' 
 results <- stringDB_GO_enrichmentData()  # enrichment using STRING 
## Error in stringDB_GO_enrichmentData(): object 'selectedHeatmap.data.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 

PPI network retrieval and analysis

 input_nGenesPPI <- 100 #Number of top genes for PPI retrieval and analysis 
 stringDB_network1(1) #Show PPI network 

Generating interactive PPI

 write(stringDB_network_link(), 'PPI_results.html') # write results to html file 
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
## Warning:  we couldn't map to STRING 0% of your identifiers
 browseURL('PPI_results.html') # open in browser 

8. Pathway analysis

 input_selectContrast1 <- 'Microgravity-Terrestrial'    #select Comparison 
 #input_selectContrast1 = limma.out$comparisons[3] # manually set
 input_selectGO <- 'GOBP'   #Gene set category 
 #input_selectGO='custom' # if custom gmt file
 input_minSetSize <- 15 #Min size for gene set
 input_maxSetSize <- 2000   #Max size for gene set 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_pathwayPvalCutoff <- 0.2 #FDR cutoff
 input_nPathwayShow <- 30   #Top pathways to show
 input_absoluteFold <- FALSE    #Use absolute values of fold-change?
 input_GenePvalCutoff <- 1  #FDR to remove genes 

 input_pathwayMethod = 1  # 1  GAGE
 gagePathwayData.out <- gagePathwayData()  # pathway analysis using GAGE  
   
 results <- gagePathwayData.out  #GAGE pathway analysis results  
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| Direction | GAGE analysis: Microgravity vs Terrestrial | statistic | Genes | adj.Pval |
|---|---|---|---|---|
| Down | Response to chitin | -7.1354 | 104 | 7.7e-09 |
| | Secondary metabolic process | -7.1074 | 265 | 4.0e-09 |
| | Response to drug | -6.7158 | 482 | 1.0e-08 |
| | Secondary metabolite biosynthetic process | -6.301 | 118 | 3.6e-07 |
| | Response to organonitrogen compound | -5.9475 | 212 | 1.1e-06 |
| | Response to nitrogen compound | -5.8753 | 271 | 1.2e-06 |
| | Response to fungus | -5.8243 | 263 | 1.3e-06 |
| Up | Photosynthesis | 10.9155 | 223 | 1.0e-21 |
| | Ribosome biogenesis | 10.0693 | 342 | 2.4e-19 |
| | Ribonucleoprotein complex biogenesis | 9.7947 | 437 | 7.8e-19 |
| | Plastid organization | 9.3709 | 257 | 5.5e-17 |
| | NcRNA metabolic process | 9.1087 | 425 | 1.4e-16 |
| | NcRNA processing | 8.4468 | 357 | 3.5e-14 |
| | RRNA processing | 8.1361 | 239 | 7.5e-13 |
| | RRNA metabolic process | 8.1082 | 244 | 7.5e-13 |
| | Photosynthesis, light reaction | 8.045 | 119 | 5.5e-12 |
| | Chloroplast organization | 7.7787 | 198 | 6.1e-12 |
| | Generation of precursor metabolites and energy | 7.5335 | 391 | 1.2e-11 |
| | RNA modification | 7.1399 | 321 | 2.5e-10 |
| | Thylakoid membrane organization | 7.0653 | 46 | 6.2e-08 |
| | Photosynthetic electron transport chain | 6.25 | 46 | 1.1e-06 |
| | Tetrapyrrole biosynthetic process | 6.1908 | 70 | 4.3e-07 |
| | Porphyrin-containing compound biosynthetic process | 6.0376 | 67 | 8.2e-07 |
| | Tetrapyrrole metabolic process | 6.0301 | 93 | 5.5e-07 |
| | Ribosome assembly | 6.009 | 76 | 1.2e-06 |
| | Ribosomal large subunit biogenesis | 5.9893 | 99 | 8.4e-07 |
| | Chlorophyll biosynthetic process | 5.9497 | 58 | 1.3e-06 |
| | Porphyrin-containing compound metabolic process | 5.9492 | 92 | 7.9e-07 |
| | Nucleoside monophosphate metabolic process | 5.748 | 193 | 1.0e-06 |
| | Protein transmembrane transport | 5.7225 | 112 | 1.5e-06 |
 pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
 input_pathwayMethod = 3  # 3  fgsea 
 fgseaPathwayData.out <- fgseaPathwayData() #Pathway analysis using fgsea 
## Warning in fgsea(pathways = gmt, stats = fold, minSize = input_minSetSize, :
## You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To
## run fgseaMultilevel, you need to remove the nperm argument in the fgsea function
## call.
## Warning in fgseaSimple(...): There were 14 pathways for which P-values were not
## calculated properly due to unbalanced gene-level statistic values
 results <- fgseaPathwayData.out  #fgsea pathway analysis results 
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| Direction | GSEA analysis: Microgravity vs Terrestrial | NES | Genes | adj.Pval |
|---|---|---|---|---|
| Up | Photosynthesis | 3.4772 | 223 | 5.9e-02 |
| | Photosynthesis, light reaction | 3.1308 | 119 | 1.6e-02 |
| | Thylakoid membrane organization | 3.1297 | 46 | 6.4e-03 |
| | Plastid organization | 3.0452 | 257 | 9.0e-02 |
| | Tetrapyrrole biosynthetic process | 2.9993 | 70 | 8.8e-03 |
| | Chlorophyll biosynthetic process | 2.962 | 58 | 7.3e-03 |
| | Porphyrin-containing compound biosynthetic process | 2.9378 | 67 | 8.5e-03 |
| | Photosynthetic electron transport chain | 2.9018 | 46 | 6.4e-03 |
| | Ribosome assembly | 2.857 | 76 | 9.3e-03 |
| | Plastid membrane organization | 2.837 | 49 | 6.6e-03 |
| | Chloroplast organization | 2.8187 | 198 | 4.1e-02 |
| | Ribosomal large subunit biogenesis | 2.7811 | 99 | 1.3e-02 |
| | Tetrapyrrole metabolic process | 2.7695 | 93 | 1.2e-02 |
| | Protein localization to chloroplast | 2.7606 | 45 | 6.3e-03 |
| | RRNA processing | 2.7591 | 239 | 7.0e-02 |
| | Porphyrin-containing compound metabolic process | 2.7489 | 92 | 1.2e-02 |
| | Chlorophyll metabolic process | 2.7388 | 81 | 9.8e-03 |
| | Ribosomal small subunit assembly | 2.7092 | 30 | 5.4e-03 |
| | Protein targeting to chloroplast | 2.706 | 43 | 6.3e-03 |
| | Establishment of protein localization to chloroplast | 2.706 | 43 | 6.3e-03 |
| | RRNA metabolic process | 2.6926 | 244 | 8.1e-02 |
| | Chloroplast rRNA processing | 2.6858 | 18 | 4.7e-03 |
| | Chloroplast RNA processing | 2.6656 | 19 | 4.7e-03 |
| | Photosystem II assembly | 2.6213 | 25 | 4.9e-03 |
| | Pigment biosynthetic process | 2.6118 | 130 | 1.9e-02 |
| | Mitochondrial gene expression | 2.5889 | 54 | 7.0e-03 |
| | Mitochondrial translation | 2.586 | 31 | 5.4e-03 |
| | Ribosomal small subunit biogenesis | 2.5738 | 79 | 9.6e-03 |
| | Protein transmembrane transport | 2.5411 | 112 | 1.5e-02 |
| | Starch metabolic process | 2.5385 | 60 | 7.5e-03 |
  pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

   PGSEAplot() # pathway analysis using PGSEA 
## Error in findContrastSamples(input_selectContrast1, colnames(convertedData.out), : object 'c.out' not found

9. Chromosome

 input_selectContrast2 <- 'Microgravity-Terrestrial'    #select Comparison 
 #input_selectContrast2 = limma.out$comparisons[3] # manually set
 input_limmaPvalViz <- 0.1  #FDR to filter genes
 input_limmaFCViz <- 2  #Fold-change to filter genes 
 genomePlotly() # shows fold-changes on the genome 
## Warning in eval(quote(list(...)), env): NAs introduced by coercion
## Warning in genomePlotly(): NAs introduced by coercion
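
The "NAs introduced by coercion" warnings above are typical of converting non-numeric strings with as.numeric(). One plausible source, which is an assumption because the internals of genomePlotly() are not shown here, is organelle chromosome labels such as "Mt" and "Pt" in the Arabidopsis gene annotation. A minimal sketch:

```r
# Chromosome labels as they might appear in an Arabidopsis annotation (illustrative)
chr <- c("1", "2", "5", "Mt", "Pt")

# Non-numeric labels become NA; suppressWarnings() hides the coercion warning
chr_num <- suppressWarnings(as.numeric(chr))

# Keep track of which entries are numbered (nuclear) chromosomes
nuclear <- !is.na(chr_num)
```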

10. Biclustering

 input_nGenesBiclust <- 1000    #Top genes for biclustering
 input_biclustMethod <- 'BCCC()'    #Method: 'BCCC', 'QUBIC', 'runibic' ... 
 biclustering.out = biclustering()  # run analysis

 input_selectBicluster <- 1 #select a cluster 
 biclustHeatmap()   # heatmap for selected cluster 

 input_selectGO4 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO4,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 results <- geneListBclustGO()  #Enrichment analysis for the selected bicluster   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| adj.Pval | Genes | Pathways |
|---|---|---|
| 2.4e-134 | 308 | Response to abiotic stimulus |
| 5.9e-86 | 170 | Response to inorganic substance |
| 1.1e-66 | 216 | Response to organic substance |
| 5.7e-64 | 182 | Oxidation-reduction process |
| 1.3e-63 | 197 | Organonitrogen compound biosynthetic process |
| 2.2e-63 | 173 | Response to external stimulus |
| 4.5e-62 | 144 | Response to external biotic stimulus |
| 4.5e-62 | 144 | Response to other organism |
| 2.6e-61 | 144 | Response to biotic stimulus |
| 3.1e-61 | 105 | Response to metal ion |

11. Co-expression network

 input_mySoftPower <- 5 #SoftPower to cutoff
 input_nGenesNetwork <- 1000    #Number of top genes
 input_minModuleSize <- 20  #Module size minimum 
 wgcna.out = wgcna()   # run WGCNA  
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.6880  3.020          0.962  418.00    426.00  557.0
## 2      2   0.5030  1.310          0.949  240.00    241.00  388.0
## 3      3   0.2260  0.549          0.896  159.00    155.00  294.0
## 4      4   0.0337  0.163          0.868  113.00    108.00  233.0
## 5      5   0.0639 -0.187          0.865   85.00     79.00  191.0
## 6      6   0.2370 -0.350          0.875   66.30     60.10  159.0
## 7      7   0.4190 -0.473          0.908   53.20     46.40  135.0
## 8      8   0.5660 -0.576          0.946   43.70     37.00  117.0
## 9      9   0.6550 -0.661          0.944   36.50     30.10  103.0
## 10    10   0.7000 -0.706          0.948   30.90     24.80   91.1
## 11    12   0.7590 -0.822          0.928   23.00     17.50   73.4
## 12    14   0.7970 -0.918          0.922   17.70     12.80   60.7
## 13    16   0.8360 -0.968          0.932   14.00      9.68   51.3
## 14    18   0.8470 -1.020          0.929   11.40      7.55   44.1
## 15    20   0.8340 -1.050          0.905    9.39      5.96   38.5
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
 softPower()  # soft power curve 
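
The soft-threshold table printed by wgcna() above can be used to check the chosen power. With input_mySoftPower set to 5, the scale-free fit (SFT.R.sq) is only 0.0639. The sketch below picks the lowest power whose fit reaches a cutoff. The 0.8 cutoff is a rule-of-thumb assumption and not a setting taken from this report. The values are copied from the table.

```r
# Power, scale-free fit and slope copied from the wgcna() output
sft <- data.frame(
  Power    = c(1:10, 12, 14, 16, 18, 20),
  SFT.R.sq = c(0.6880, 0.5030, 0.2260, 0.0337, 0.0639, 0.2370, 0.4190,
               0.5660, 0.6550, 0.7000, 0.7590, 0.7970, 0.8360, 0.8470, 0.8340),
  slope    = c(3.020, 1.310, 0.549, 0.163, -0.187, -0.350, -0.473, -0.576,
               -0.661, -0.706, -0.822, -0.918, -0.968, -1.020, -1.050)
)

r2_cutoff <- 0.8  # rule-of-thumb threshold; an assumption, adjust as needed

# Lowest power with a negative slope whose fit reaches the cutoff
ok <- sft$slope < 0 & sft$SFT.R.sq >= r2_cutoff
best_power <- if (any(ok)) min(sft$Power[ok]) else NA
best_power
```

Under these assumptions the sketch selects a power of 16. Whether a higher power suits this dataset is a judgment call, since mean connectivity also drops as the power increases.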

  modulePlot()  # plot modules  

  listWGCNA.Modules.out = listWGCNA.Modules() #modules
 input_selectGO5 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO5,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_selectWGCNA.Module <- 'Entire network'   #Select a module
 input_topGenesNetwork <- 10    #Number of top genes
 input_edgeThreshold <- 0.4 #Edge threshold 
 moduleNetwork()    # show network of top genes in selected module
##  softConnectivity: FYI: connecitivty of genes with less than 4 valid samples will be returned as NA.
##  ..calculating connectivities..

 input_removeRedudantSets <- TRUE   #Remove redundant gene sets 
 results <- networkModuleGO()  #Enrichment analysis of selected module
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
| adj.Pval | Genes | Pathways |
|---|---|---|
| 2.4e-134 | 308 | Response to abiotic stimulus |
| 5.9e-86 | 170 | Response to inorganic substance |
| 1.1e-66 | 216 | Response to organic substance |
| 5.7e-64 | 182 | Oxidation-reduction process |
| 1.3e-63 | 197 | Organonitrogen compound biosynthetic process |
| 2.2e-63 | 173 | Response to external stimulus |
| 4.5e-62 | 144 | Response to external biotic stimulus |
| 4.5e-62 | 144 | Response to other organism |
| 2.6e-61 | 144 | Response to biotic stimulus |
| 3.1e-61 | 105 | Response to metal ion |